A Brief Survey on Semi-supervised Learning with Graph Regularization

Author

  • Hongyuan You
Abstract

In this survey, we review several historical works on semi-supervised learning that apply graph regularization to both labeled and unlabeled data to improve classification performance. These semi-supervised methods usually construct a nearest-neighbour graph on the instance space under a chosen distance measure, and then work under the smoothness assumption that the class labels of samples change slowly between connected instances on the graph. The graph Laplacian is used in the graph smoothness regularizer because it is a discrete analogue of the Laplace-Beltrami smoothness operator in continuous space. A relationship between kernel methods and the graph Laplacian has also been discovered, suggesting an approach to designing classification kernels from the spectrum of the Laplacian matrix.

1 Smoothness Regularization via Laplace Operator and Graph Laplacian

1.1 Semi-supervised Learning Problem

Nowadays there are many practical applications of machine learning. Although in academic papers we often assume these problems come with nice conditions and a huge number of samples for inference and learning, such datasets are rarely available in reality. One of the biggest problems is that not all sample instances have class labels, since people can hardly label millions of instances manually. We therefore need learning methods that handle both labeled and unlabeled data at the same time, exploiting the labels of the labeled instances as well as the information carried by the unlabeled instances. Let X be the instance space, {x1, x2, ..., xk} the k labeled training samples with labels {y1, y2, ..., yk}, yi ∈ {−1, +1}, and {xk+1, xk+2, ..., xn} the n − k unlabeled samples. Usually we have k ≪ n, meaning only very few samples are labeled. Our goal is to learn a classification function f : X → [−1, +1] to predict the class of test samples.
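The pipeline described above, building a nearest-neighbour graph and then enforcing the smoothness penalty fᵀLf with L the graph Laplacian, can be sketched in a few lines. This is an illustrative sketch, not code from the survey: the function names (`knn_graph`, `laplacian`, `harmonic_labels`) are our own, and the final step uses the harmonic-function solution of Zhu et al. as one concrete way to minimize the penalty subject to the known labels.

```python
# Minimal sketch of graph-based semi-supervised classification, assuming a
# symmetric k-nearest-neighbour graph with 0/1 weights and the combinatorial
# graph Laplacian L = D - W. Function names are illustrative only.
import numpy as np

def knn_graph(X, k=3):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbour graph."""
    n = len(X)
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # skip the point itself (distance 0), connect to its k nearest
        for j in np.argsort(d2[i])[1:k + 1]:
            W[i, j] = W[j, i] = 1.0
    return W

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W, D the degree matrix."""
    return np.diag(W.sum(axis=1)) - W

def harmonic_labels(W, y_labeled, n_labeled):
    """Minimize the smoothness penalty f^T L f subject to f agreeing with
    the known labels; assumes the labeled points come first in W."""
    L = laplacian(W)
    u = slice(n_labeled, None)            # unlabeled block
    Luu = L[u, u]
    Wul = W[u, :n_labeled]
    # Harmonic solution: f_u = L_uu^{-1} W_ul y_l
    return np.linalg.solve(Luu, Wul @ y_labeled)
```

In the transductive setting this is already the whole method: the sign of each entry of `f_u` is the predicted class of the corresponding unlabeled (test) point.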
We restrict ourselves to the transductive setting, where the unlabeled data also serve as the test data in our performance analysis.

1.2 Manifold Assumption and Laplace Operator

The information in unlabeled instances is hard to incorporate into the classification model unless we can make a reasonable assumption about the unlabeled data. The first assumption is that the data lie on a low-dimensional manifold within a high-dimensional representation space. We can easily find many examples to justify this assumption. For instance, a handwritten digit can be represented as a matrix of image pixels of very high dimension, yet it can also be described by only a few parameters in a low-dimensional space, and in that space similar handwritten digits lie close to each other, forming a low-dimensional manifold. Similar examples can be found in document categorization problems, where people discov...

Similar Articles

Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including Support Vector Machines and Regularized Least Squares can...



Comparing Diffusion Models for Graph–Based Semi–Supervised Learning

The main idea behind graph-based semi–supervised learning is to use pair–wise similarities between data instances to enhance classification accuracy (see (Zhu, 2005) for a survey of existing approaches). Many graph–based techniques use certain type of regularization that often involve a graph Laplacian operator (e.g., see (Belkin et al., 2006)). Intuitively, this corresponds to a diffusion proc...


Semi-Supervised Learning with Max-Margin Graph Cuts

This paper proposes a novel algorithm for semi-supervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository dataset...


Semi-supervised Regression with Order Preferences

Following a discussion on the general form of regularization for semi-supervised learning, we propose a semi-supervised regression algorithm. It is based on the assumption that we have certain order preferences on unlabeled data (e.g., point x1 has a larger target value than x2). Semi-supervised learning consists of enforcing the order preferences as regularization in a risk minimization framew...



Publication date: 2014